Clustering with Bregman Divergences

نویسندگان

  • Arindam Banerjee
  • Srujana Merugu
  • Inderjit S. Dhillon
  • Joydeep Ghosh
چکیده

A wide variety of distortion functions, such as squared Euclidean distance, Mahalanobis distance, Itakura-Saito distance and relative entropy, have been used for clustering. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical kmeans, the Linde-Buzo-Gray (LBG) algorithm and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical kmeans algorithm, while generalizing the method to a large class of clustering loss functions. This is achieved by first posing the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate distortion theory, and then deriving an iterative algorithm that monotonically decreases this loss. In addition, we show that there is a bijection between regular exponential families and a large class of Bregman divergences, that we call regular Bregman divergences. This result enables the development of an alternative interpretation of an efficient EM scheme for learning mixtures of exponential family distributions, and leads to a simple soft clustering algorithm for regular Bregman divergences. Finally, we discuss the connection between rate distortion theory and Bregman clustering and present an information theoretic analysis of Bregman clustering algorithms in terms of a trade-off between compression and loss in Bregman information.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Non-flat Clusteringwhith Alpha-divergences

The scope of the well-known k-means algorithm has been broadly extended with some recent results: first, the kmeans++ initialization method gives some approximation guarantees; second, the Bregman k-means algorithm generalizes the classical algorithm to the large family of Bregman divergences. The Bregman seeding framework combines approximation guarantees with Bregman divergences. We present h...

متن کامل

Submodular-Bregman and the Lovász-Bregman Divergences with Applications: Extended Version

We introduce a class of discrete divergences on sets (equivalently binary vectors)that we call the submodular-Bregman divergences. We consider two kinds ofsubmodular Bregman divergence, defined either from tight modular upper or tightmodular lower bounds of a submodular function. We show that the properties ofthese divergences are analogous to the (standard continuous) Bregman d...

متن کامل

Agglomerative Bregman Clustering

This manuscript develops the theory of agglomerative clustering with Bregman divergences. Geometric smoothing techniques are developed to deal with degenerate clusters. To allow for cluster models based on exponential families with overcomplete representations, Bregman divergences are developed for nondifferentiable convex functions.

متن کامل

Clustering with Bregman Divergences: an Asymptotic Analysis

Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters k is large. We establish quantization rates and describe the lim...

متن کامل

Symmetrized Bregman Divergences and Metrics

While Bregman divergences [3] have been used for several machine learning problems in recent years, the facts that they are asymmetric and does not satisfy triangle inequality have been a major limitation. In this paper, we investigate the relationship between two families of symmetrized Bregman divergences and metrics, which satisfy the triangle inequality. Further, we investigate kmeans-type ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2004